EDA - Exploratory Data Analysis

importing data

Loaded df_appearances from ../pickles/df_appearances.pkl
Skipping player_performance...
Skipping player_game_team_mapping...
Skipping df_games_odds...
Loaded df_teamstats from ../pickles/df_teamstats.pkl
Loaded df_shots from ../pickles/df_shots.pkl
Loaded gameresult from ../pickles/gameresult.pkl
Loaded df_after_outliers_missing from ../pickles/df_after_outliers_missing.pkl
Skipping team_performance...
Loaded df_with_categories from ../pickles/df_with_categories.pkl
Loaded df_num_after_EDA from ../pickles/df_num_after_EDA.pkl
Skipping df_games...
Loaded manipulated_data_no_outleirs from ../pickles/manipulated_data_no_outleirs.pkl
Loaded player_shots from ../pickles/player_shots.pkl
Loaded df_after_EDA from ../pickles/df_after_EDA.pkl
Loaded teamstats from ../pickles/teamstats.pkl
Loaded df_combined from ../pickles/df_combined.pkl
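The log above is consistent with a loader that walks the ../pickles directory, skips a fixed set of names, and unpickles everything else. The cell that produced it is not shown, so the following is only a minimal sketch: the function name load_pickles, its skip parameter, and the use of the standard-library pickle module (the notebook may well have used pandas.read_pickle instead) are all assumptions.

```python
import pickle
from pathlib import Path


def load_pickles(directory, skip=()):
    """Load every *.pkl file in `directory` into a dict keyed by file stem.

    Names listed in `skip` are reported and left unloaded.
    (Hypothetical helper; not the notebook's actual loader.)
    """
    loaded = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        name = path.stem
        if name in skip:
            print(f"Skipping {name}...")
            continue
        with path.open("rb") as fh:
            loaded[name] = pickle.load(fh)
        print(f"Loaded {name} from {path}")
    return loaded
```

A call such as load_pickles("../pickles", skip={"df_games", "team_performance"}) would return a dict of the loaded objects. Only unpickle files from a trusted source, since unpickling can execute arbitrary code.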

descriptive statistics

gameID leagueID season date homeTeamID awayTeamID homeGoals awayGoals homeGoalsHalfTime awayGoalsHalfTime home_xGoals home_shots home_shotsOnTarget home_deep home_ppda home_fouls home_corners home_yellowCards home_redCards home_total_assists home_total_xAssists home_total_key_passes home_total_xGoalsChain home_total_xGoalsBuildup home_total_yellow_cards home_total_red_cards home_total_blocked_shots home_total_saved_shots away_xGoals away_shots away_shotsOnTarget away_deep away_ppda away_fouls away_corners away_yellowCards away_redCards away_total_assists away_total_xAssists away_total_key_passes away_total_xGoalsChain away_total_xGoalsBuildup away_total_yellow_cards away_total_red_cards away_total_blocked_shots away_total_saved_shots gameresult
0 81 1 2015 2015-08-08 15:45:00 89 82 1 0 1 0 0.627539 9 1 4 13.8261 12 1 2.0 0 0 0.284979 5 1.396328 0.994160 2 0 4.0 1.0 0.674600 9 4 10 8.2188 12 2 3.0 0 0 0.586365 7 1.745371 0.811549 3 0 3.0 4.0 H
1 82 1 2015 2015-08-08 18:00:00 73 71 0 1 0 0 0.876106 11 2 11 6.9000 13 6 3.0 0 0 0.419975 9 2.159510 1.170894 3 0 2.0 2.0 0.782253 7 3 2 11.8462 13 3 4.0 0 1 0.560695 4 1.238205 0.736815 4 0 2.0 2.0 A
2 83 1 2015 2015-08-08 18:00:00 72 90 2 2 0 1 0.604226 10 5 5 6.6500 7 8 1.0 0 2 0.549139 8 1.025550 0.493522 1 0 2.0 3.0 0.557892 11 5 4 17.1579 13 2 2.0 0 1 0.418385 8 1.959323 1.030588 2 0 3.0 3.0 D
3 84 1 2015 2015-08-08 18:00:00 75 77 4 2 3 0 2.568030 19 8 5 10.8800 13 6 2.0 0 2 1.727543 18 6.815649 3.741916 2 0 4.0 4.0 1.459460 11 5 6 9.5556 17 3 4.0 0 2 1.288886 9 7.622863 5.617276 4 0 2.0 3.0 H
4 85 1 2015 2015-08-08 18:00:00 79 78 1 3 0 1 1.130760 17 6 5 5.7368 14 1 1.0 0 1 0.416638 12 1.966623 0.699249 1 0 3.0 4.0 2.109750 11 7 10 10.6250 20 4 0.0 0 3 2.050685 10 10.799517 8.554974 0 0 2.0 4.0 A
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12675 16131 5 2020 2021-05-23 19:00:00 168 166 1 2 1 1 1.411190 15 5 17 12.3684 8 9 2.0 0 1 0.971853 11 3.853730 1.999150 2 0 6.0 4.0 1.707510 8 5 3 8.3529 11 5 2.0 0 1 0.307960 4 1.223212 0.715843 2 0 1.0 3.0 A
12676 16132 5 2020 2021-05-23 19:00:00 177 176 1 2 1 1 1.198190 10 3 3 16.2632 11 5 1.0 0 1 0.855524 8 1.962812 1.028432 1 0 3.0 2.0 1.238050 12 5 4 27.0000 6 2 1.0 0 1 0.775388 7 2.610665 1.758012 1 0 4.0 3.0 A
12677 16133 5 2020 2021-05-23 19:00:00 163 235 2 0 1 0 1.332690 12 6 10 8.2857 11 4 1.0 0 1 1.151649 8 7.684589 5.704923 1 0 2.0 4.0 0.357583 9 2 0 39.7273 10 3 0.0 0 0 0.216965 6 0.884652 0.544502 0 0 0.0 2.0 H
12678 16134 5 2020 2021-05-23 19:00:00 175 181 0 1 0 1 1.460500 19 5 6 7.5600 13 9 1.0 0 0 1.265829 13 4.790546 3.092978 1 0 5.0 5.0 1.380290 10 2 3 14.7200 10 3 0.0 0 1 0.565077 6 1.256511 0.764512 0 0 1.0 1.0 A
12679 16135 5 2020 2021-05-23 19:00:00 225 179 1 1 1 0 0.323960 6 2 1 15.1000 17 2 1.0 0 0 0.074636 4 0.528499 0.205685 1 0 0.0 1.0 0.521913 7 1 0 15.9524 9 3 1.0 0 1 0.470476 4 0.502347 0.421488 1 0 2.0 0.0 D

12680 rows × 47 columns
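In every row displayed above, gameresult agrees with the full-time score: H when homeGoals exceeds awayGoals, A when it is lower, and D when the two are equal. How the notebook actually built the column is not shown; the sketch below just encodes that observed mapping, and game_result is a hypothetical helper name.

```python
def game_result(home_goals, away_goals):
    """Return 'H' for a home win, 'A' for an away win, 'D' for a draw."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


# (homeGoals, awayGoals) for the first three rows shown above
rows = [(1, 0), (0, 1), (2, 2)]
labels = [game_result(h, a) for h, a in rows]
print(labels)  # ['H', 'A', 'D']
```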

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: Autoviz in /home/leoadmin/.local/lib/python3.8/site-packages (0.1.902)
Shape of your Data Set loaded: (12680, 47)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  16
    Number of Integer-Categorical Columns =  27
    Number of String-Categorical Columns =  1
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  0
    Number of Numeric-Boolean Columns =  0
    Number of Discrete String Columns =  0
    Number of NLP String Columns =  1
    Number of Date Time Columns =  1
    Number of ID Columns =  1
    Number of Columns to Delete =  0
    47 Predictors classified...
        1 variable(s) removed since they were ID or low-information variables
        List of variables removed: ['gameID']
16 numeric variables in data exceeds limit, taking top 30 variables
    List of variables selected: ['home_xGoals', 'home_ppda', 'home_total_xAssists', 'home_total_xGoalsChain', 'home_total_xGoalsBuildup', 'home_total_blocked_shots', 'home_total_saved_shots', 'away_xGoals', 'away_ppda', 'away_total_xAssists', 'away_total_xGoalsChain', 'away_total_xGoalsBuildup', 'away_total_blocked_shots', 'home_yellowCards', 'away_yellowCards', 'away_total_saved_shots']
   Total columns > 30, too numerous to print.
To fix these data quality issues in the dataset, import FixDQ from autoviz...
    All variables classified into correct types.
  Data Type Missing Values% Unique Values% Minimum Value Maximum Value DQ Issue
gameID int64 0.000000 100 81.000000 16135.000000 Possible ID column: drop before modeling step.
leagueID int64 0.000000 0 1.000000 5.000000 No issue
season int64 0.000000 0 2014.000000 2020.000000 Possible date-time colum: transform before modeling step.
date object 0.000000 53 No issue
homeTeamID int64 0.000000 1 71.000000 262.000000 Column has 55 outliers greater than upper bound (256.00) or lower than lower bound(8.00). Cap them or remove them.
awayTeamID int64 0.000000 1 71.000000 262.000000 Column has 55 outliers greater than upper bound (256.00) or lower than lower bound(8.00). Cap them or remove them.
homeGoals int64 0.000000 0 0.000000 10.000000 Column has 981 outliers greater than upper bound (3.50) or lower than lower bound(-0.50). Cap them or remove them.
awayGoals int64 0.000000 0 0.000000 9.000000 Column has 44 outliers greater than upper bound (5.00) or lower than lower bound(-3.00). Cap them or remove them.
homeGoalsHalfTime int64 0.000000 0 0.000000 6.000000 Column has 403 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them.
awayGoalsHalfTime int64 0.000000 0 0.000000 5.000000 Column has 225 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them.
home_xGoals float64 0.000000 NA 0.000000 6.630490 Column has 276 outliers greater than upper bound (3.76) or lower than lower bound(-0.92). Cap them or remove them.
home_shots int64 0.000000 0 0.000000 47.000000 Column has 167 outliers greater than upper bound (27.50) or lower than lower bound(-0.50). Cap them or remove them.
home_shotsOnTarget int64 0.000000 0 0.000000 18.000000 Column has 351 outliers greater than upper bound (10.50) or lower than lower bound(-1.50). Cap them or remove them., Column has a high correlation with ['home_total_saved_shots']. Consider dropping one of them.
home_deep int64 0.000000 0 0.000000 42.000000 Column has 227 outliers greater than upper bound (18.00) or lower than lower bound(-6.00). Cap them or remove them.
home_ppda float64 0.000000 NA 1.897400 97.333300 Column has 540 outliers greater than upper bound (21.31) or lower than lower bound(-1.88). Cap them or remove them.
home_fouls int64 0.000000 0 0.000000 33.000000 Column has 241 outliers greater than upper bound (22.50) or lower than lower bound(2.50). Cap them or remove them.
home_corners int64 0.000000 0 0.000000 20.000000 Column has 143 outliers greater than upper bound (13.00) or lower than lower bound(-3.00). Cap them or remove them.
home_yellowCards float64 0.007886 NA 0.000000 8.000000 1 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 37 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them.
home_redCards int64 0.000000 0 0.000000 3.000000 Column has 1078 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them.
home_total_assists int64 0.000000 0 0.000000 8.000000 Column has 20 outliers greater than upper bound (5.00) or lower than lower bound(-3.00). Cap them or remove them., Column has a high correlation with ['homeGoals']. Consider dropping one of them.
home_total_xAssists float64 0.000000 NA 0.000000 5.512622 Column has 346 outliers greater than upper bound (2.76) or lower than lower bound(-0.79). Cap them or remove them., Column has a high correlation with ['home_xGoals']. Consider dropping one of them.
home_total_key_passes int64 0.000000 0 0.000000 38.000000 Column has 123 outliers greater than upper bound (22.00) or lower than lower bound(-2.00). Cap them or remove them., Column has a high correlation with ['home_shots']. Consider dropping one of them.
home_total_xGoalsChain float64 0.000000 NA 0.000000 32.011994 Column has 523 outliers greater than upper bound (10.99) or lower than lower bound(-3.97). Cap them or remove them., Column has a high correlation with ['home_total_xAssists']. Consider dropping one of them.
home_total_xGoalsBuildup float64 0.000000 NA 0.000000 24.437683 Column has 664 outliers greater than upper bound (6.79) or lower than lower bound(-2.88). Cap them or remove them., Column has a high correlation with ['home_total_xGoalsChain']. Consider dropping one of them.
home_total_yellow_cards int64 0.000000 0 0.000000 8.000000 Column has 20 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them., Column has a high correlation with ['home_yellowCards']. Consider dropping one of them.
home_total_red_cards int64 0.000000 0 0.000000 3.000000 Column has 1064 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them., Column has a high correlation with ['home_redCards']. Consider dropping one of them.
home_total_blocked_shots float64 0.023659 NA 0.000000 19.000000 3 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 230 outliers greater than upper bound (9.50) or lower than lower bound(-2.50). Cap them or remove them.
home_total_saved_shots float64 0.023659 NA 0.000000 17.000000 3 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 532 outliers greater than upper bound (7.00) or lower than lower bound(-1.00). Cap them or remove them.
away_xGoals float64 0.000000 NA 0.000000 6.186960 Column has 312 outliers greater than upper bound (3.10) or lower than lower bound(-0.91). Cap them or remove them.
away_shots int64 0.000000 0 0.000000 39.000000 Column has 161 outliers greater than upper bound (23.00) or lower than lower bound(-1.00). Cap them or remove them.
away_shotsOnTarget int64 0.000000 0 0.000000 15.000000 Column has 233 outliers greater than upper bound (9.50) or lower than lower bound(-2.50). Cap them or remove them., Column has a high correlation with ['away_total_saved_shots']. Consider dropping one of them.
away_deep int64 0.000000 0 0.000000 28.000000 Column has 423 outliers greater than upper bound (13.00) or lower than lower bound(-3.00). Cap them or remove them.
away_ppda float64 0.000000 NA 2.122000 152.000000 Column has 606 outliers greater than upper bound (24.25) or lower than lower bound(-2.58). Cap them or remove them.
away_fouls int64 0.000000 0 0.000000 32.000000 Column has 81 outliers greater than upper bound (25.00) or lower than lower bound(1.00). Cap them or remove them.
away_corners int64 0.000000 0 0.000000 19.000000 Column has 283 outliers greater than upper bound (10.50) or lower than lower bound(-1.50). Cap them or remove them.
away_yellowCards float64 0.000000 NA 0.000000 9.000000 Column has 43 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them.
away_redCards int64 0.000000 0 0.000000 3.000000 Column has 1396 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them.
away_total_assists int64 0.000000 0 0.000000 8.000000 Column has 790 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them., Column has a high correlation with ['awayGoals']. Consider dropping one of them.
away_total_xAssists float64 0.000000 NA 0.000000 5.463750 Column has 389 outliers greater than upper bound (2.29) or lower than lower bound(-0.77). Cap them or remove them., Column has a high correlation with ['away_xGoals']. Consider dropping one of them.
away_total_key_passes int64 0.000000 0 0.000000 27.000000 Column has 169 outliers greater than upper bound (18.50) or lower than lower bound(-1.50). Cap them or remove them., Column has a high correlation with ['away_shots']. Consider dropping one of them.
away_total_xGoalsChain float64 0.000000 NA 0.000000 34.587459 Column has 538 outliers greater than upper bound (9.16) or lower than lower bound(-3.62). Cap them or remove them., Column has a high correlation with ['away_total_xAssists']. Consider dropping one of them.
away_total_xGoalsBuildup float64 0.000000 NA 0.000000 27.419105 Column has 718 outliers greater than upper bound (5.57) or lower than lower bound(-2.50). Cap them or remove them., Column has a high correlation with ['away_total_xGoalsChain']. Consider dropping one of them.
away_total_yellow_cards int64 0.000000 0 0.000000 9.000000 Column has 26 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them., Column has a high correlation with ['away_yellowCards']. Consider dropping one of them.
away_total_red_cards int64 0.000000 0 0.000000 3.000000 Column has 1382 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them., Column has a high correlation with ['away_redCards']. Consider dropping one of them.
away_total_blocked_shots float64 0.063091 NA 0.000000 16.000000 8 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 187 outliers greater than upper bound (8.50) or lower than lower bound(-3.50). Cap them or remove them.
away_total_saved_shots float64 0.063091 NA 0.000000 13.000000 8 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 95 outliers greater than upper bound (8.50) or lower than lower bound(-3.50). Cap them or remove them.
gameresult object 0.000000 0 No issue
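The upper and lower bounds in the DQ Issue column are consistent with the usual 1.5 × IQR fences. For homeGoals, bounds of -0.50 and 3.50 correspond to Q1 = 1 and Q3 = 2. For the red-card columns both bounds are 0.00, which implies Q1 = Q3 = 0, so every match with at least one red card is counted as an "outlier" even though those values are valid. The sketch below illustrates the rule on made-up data; the exact quartile method AutoViz uses is an assumption here, and iqr_bounds and count_outliers are hypothetical helper names.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates linearly between data points
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, k=1.5):
    """Count values strictly outside the fences."""
    lower, upper = iqr_bounds(values, k)
    return sum(1 for v in values if v < lower or v > upper)


# Illustrative data only, not taken from the dataset.
goals = [0, 1, 1, 1, 1, 2, 2, 2, 2, 5]
print(iqr_bounds(goals))      # (-0.5, 3.5)
print(count_outliers(goals))  # 1

# A mostly-zero column such as red cards: Q1 == Q3 == 0, so both fences are 0
red_cards = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
print(iqr_bounds(red_cards))      # (0.0, 0.0)
print(count_outliers(red_cards))  # 2
```

Capping or removing values flagged this way is therefore a judgment call for count columns such as goals or cards, where large values are rare but legitimate.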
Number of All Scatter Plots = 136
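Several columns in the report are flagged as highly correlated with another column (for example home_total_key_passes with home_shots). A pairwise check of that kind can be sketched as below. The 0.8 threshold is an assumption, since the output does not state the cutoff AutoViz applies, and pearson and high_corr_pairs are hypothetical helper names. The sample values are the first five rows of the table shown earlier, far too few to be more than an illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def high_corr_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs


# First five rows of the wide table above (illustration only)
columns = {
    "home_shots": [9, 11, 10, 19, 17],
    "home_total_key_passes": [5, 9, 8, 18, 12],
    "home_fouls": [12, 13, 7, 13, 14],
}
pairs = high_corr_pairs(columns)
print(pairs)
```

On these five rows only the home_shots / home_total_key_passes pair clears the assumed threshold.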
Could not draw wordcloud plot for date
All Plots done
Time to run AutoViz = 273 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
=== Running AutoViz on df_appearances ===
    max_rows_analyzed is smaller than dataset shape 356513...
        randomly sampled 150000 rows from read CSV file
Shape of your Data Set loaded: (150000, 18)
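Because df_appearances has 356513 rows and the row limit is 150000, the statistics that follow describe a random sample of roughly 42% of the data rather than the full table. A capped random sample of that kind can be sketched as follows; sample_rows and its seed parameter are hypothetical and do not reflect how AutoViz samples internally.

```python
import random


def sample_rows(rows, max_rows, seed=42):
    """Return at most `max_rows` rows, sampled without replacement.

    A fixed seed keeps the sample reproducible between runs.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, max_rows)


rows = list(range(356513))          # stand-in for the row index
sample = sample_rows(rows, 150000)
print(len(sample))                  # 150000
```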
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  4
    Number of Integer-Categorical Columns =  9
    Number of String-Categorical Columns =  0
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  0
    Number of Numeric-Boolean Columns =  5
    Number of Discrete String Columns =  0
    Number of NLP String Columns =  0
    Number of Date Time Columns =  0
    Number of ID Columns =  0
    Number of Columns to Delete =  0
    18 Predictors classified...
        No variables removed since no ID or low-information variables found in data set
Since Number of Rows in data 150000 exceeds maximum, randomly sampling 150000 rows for EDA...
To fix these data quality issues in the dataset, import FixDQ from autoviz...
    All variables classified into correct types.
  Data Type Missing Values% Unique Values% Minimum Value Maximum Value DQ Issue
gameID int64 0.000000 8 81.000000 16135.000000 No issue
playerID int64 0.000000 4 1.000000 9567.000000 Column has 3 outliers greater than upper bound (9547.50) or lower than lower bound(-4192.50). Cap them or remove them.
goals int64 0.000000 0 0.000000 5.000000 Column has 12763 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them.
ownGoals int64 0.000000 0 0.000000 2.000000 Column has 415 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them.
shots int64 0.000000 0 0.000000 14.000000 Column has 15692 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them.
xGoals float64 0.000000 NA 0.000000 3.271321 Column has 19761 outliers greater than upper bound (0.19) or lower than lower bound(-0.12). Cap them or remove them.
xGoalsChain float64 0.000000 NA 0.000000 5.052705 Column has 9034 outliers greater than upper bound (0.90) or lower than lower bound(-0.51). Cap them or remove them.
xGoalsBuildup float64 0.000000 NA 0.000000 3.465713 Column has 18971 outliers greater than upper bound (0.41) or lower than lower bound(-0.25). Cap them or remove them.
assists int64 0.000000 0 0.000000 4.000000 Column has 9165 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them.
keyPasses int64 0.000000 0 0.000000 11.000000 Column has 9392 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them.
xAssists float64 0.000000 NA 0.000000 3.074537 Column has 18890 outliers greater than upper bound (0.14) or lower than lower bound(-0.08). Cap them or remove them.
positionOrder int64 0.000000 0 1.000000 17.000000 No issue
yellowCard int64 0.000000 0 0.000000 1.000000 No issue
redCard int64 0.000000 0 0.000000 1.000000 No issue
time int64 0.000000 0 1.000000 90.000000 Column has 9880 outliers greater than upper bound (138.00) or lower than lower bound(10.00). Cap them or remove them.
subOut int64 0.000000 0 0.000000 1.000000 No issue
subIn int64 0.000000 0 0.000000 1.000000 No issue
leagueID int64 0.000000 0 1.000000 5.000000 No issue
Number of All Scatter Plots = 10
All Plots done
Time to run AutoViz = 23 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
=== Completed AutoViz on df_appearances ===

=== Running AutoViz on df_teamstats ===
Shape of your Data Set loaded: (25360, 16)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  3
    Number of Integer-Categorical Columns =  9
    Number of String-Categorical Columns =  1
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  1
    Number of Numeric-Boolean Columns =  0
    Number of Discrete String Columns =  0
    Number of NLP String Columns =  1
    Number of Date Time Columns =  1
    Number of ID Columns =  0
    Number of Columns to Delete =  0
    16 Predictors classified...
        No variables removed since no ID or low-information variables found in data set
To fix these data quality issues in the dataset, import FixDQ from autoviz...
    All variables classified into correct types.
| Column | Data Type | Missing Values% | Unique Values% | Minimum Value | Maximum Value | DQ Issue |
|---|---|---|---|---|---|---|
| gameID | int64 | 0.000000 | 50 | 81.000000 | 16135.000000 | No issue |
| teamID | int64 | 0.000000 | 0 | 71.000000 | 262.000000 | Column has 110 outliers greater than upper bound (256.00) or lower than lower bound(8.00). Cap them or remove them. |
| season | int64 | 0.000000 | 0 | 2014.000000 | 2020.000000 | Possible date-time colum: transform before modeling step. |
| date | object | 0.000000 | 26 | | | No issue |
| location | object | 0.000000 | 0 | | | No issue |
| goals | int64 | 0.000000 | 0 | 0.000000 | 10.000000 | Column has 156 outliers greater than upper bound (5.00) or lower than lower bound(-3.00). Cap them or remove them. |
| xGoals | float64 | 0.000000 | NA | 0.000000 | 6.630490 | Column has 556 outliers greater than upper bound (3.48) or lower than lower bound(-0.97). Cap them or remove them. |
| shots | int64 | 0.000000 | 0 | 0.000000 | 47.000000 | Column has 263 outliers greater than upper bound (26.50) or lower than lower bound(-1.50). Cap them or remove them. |
| shotsOnTarget | int64 | 0.000000 | 0 | 0.000000 | 18.000000 | Column has 469 outliers greater than upper bound (10.50) or lower than lower bound(-1.50). Cap them or remove them. |
| deep | int64 | 0.000000 | 0 | 0.000000 | 42.000000 | Column has 722 outliers greater than upper bound (15.50) or lower than lower bound(-4.50). Cap them or remove them. |
| ppda | float64 | 0.000000 | NA | 1.897400 | 152.000000 | Column has 1177 outliers greater than upper bound (22.85) or lower than lower bound(-2.34). Cap them or remove them. |
| fouls | int64 | 0.000000 | 0 | 0.000000 | 33.000000 | Column has 139 outliers greater than upper bound (25.00) or lower than lower bound(1.00). Cap them or remove them. |
| corners | int64 | 0.000000 | 0 | 0.000000 | 20.000000 | Column has 178 outliers greater than upper bound (13.00) or lower than lower bound(-3.00). Cap them or remove them. |
| yellowCards | float64 | 0.003943 | NA | 0.000000 | 9.000000 | 1 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 80 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them. |
| redCards | int64 | 0.000000 | 0 | 0.000000 | 3.000000 | Column has 2474 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them. |
| result | object | 0.000000 | 0 | | | No issue |
Number of All Scatter Plots = 6
Could not draw wordcloud plot for date
All Plots done
Time to run AutoViz = 20 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
=== Completed AutoViz on df_teamstats ===
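
The outlier bounds reported for df_teamstats look consistent with Tukey fences (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR); that reading is an assumption based on the numbers above, not on AutoViz documentation. Under that assumption, a minimal sketch of how such bounds could be reproduced and used for capping follows. The helper names `iqr_bounds` and `cap_outliers` and the toy values are hypothetical. A column such as redCards, where both bounds are 0.00, has a zero IQR, so every non-zero value is flagged; the sketch leaves such columns unchanged rather than flattening them.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) Tukey fences for a numeric series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the Tukey fences; skip columns whose IQR is zero."""
    lower, upper = iqr_bounds(series, k)
    values = series.astype(float)
    if lower == upper:
        # Zero IQR (e.g. a mostly-zero count): capping would erase all signal.
        return values
    return values.clip(lower=lower, upper=upper)


# Toy data only (not taken from df_teamstats).
toy = pd.DataFrame(
    {
        "goals": [0, 1, 1, 2, 2, 3, 10],
        "redCards": [0, 0, 0, 0, 0, 0, 1],
    }
)
capped = pd.DataFrame({col: cap_outliers(toy[col]) for col in toy.columns})
print(iqr_bounds(toy["goals"]))
print(capped)
```

Whether capping is appropriate at all is a modeling decision: a 10-goal match is rare but real, so flagged values in count columns are not necessarily errors.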

=== Running AutoViz on df_shots ===
    max_rows_analyzed is smaller than dataset shape 324543...
        randomly sampled 150000 rows from read CSV file
Shape of your Data Set loaded: (150000, 11)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  4
    Number of Integer-Categorical Columns =  3
    Number of String-Categorical Columns =  4
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  0
    Number of Numeric-Boolean Columns =  0
    Number of Discrete String Columns =  0
    Number of NLP String Columns =  0
    Number of Date Time Columns =  0
    Number of ID Columns =  0
    Number of Columns to Delete =  0
    11 Predictors classified...
        No variables removed since no ID or low-information variables found in data set
Since Number of Rows in data 150000 exceeds maximum, randomly sampling 150000 rows for EDA...
To fix these data quality issues in the dataset, import FixDQ from autoviz...
    All variables classified into correct types.
| Column | Data Type | Missing Values% | Unique Values% | Minimum Value | Maximum Value | DQ Issue |
|---|---|---|---|---|---|---|
| gameID | int64 | 0.000000 | 8 | 81.000000 | 16135.000000 | No issue |
| shooterID | int64 | 0.000000 | 3 | 3.000000 | 9566.000000 | Column has 2332 outliers greater than upper bound (8225.50) or lower than lower bound(-3578.50). Cap them or remove them. |
| assisterID | float64 | 26.037333 | NA | 1.000000 | 9526.000000 | 39056 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 2133 outliers greater than upper bound (8127.00) or lower than lower bound(-3553.00). Cap them or remove them. |
| minute | int64 | 0.000000 | 0 | 0.000000 | 103.000000 | No issue |
| situation | object | 0.000000 | 0 | | | No issue |
| lastAction | object | 0.000000 | 0 | | | 24 rare categories: Too many to list. Group them into a single category or drop the categories. |
| shotType | object | 0.000000 | 0 | | | 1 rare categories: ['OtherBodyPart']. Group them into a single category or drop the categories. |
| shotResult | object | 0.000000 | 0 | | | 1 rare categories: ['OwnGoal']. Group them into a single category or drop the categories. |
| xGoal | float64 | 0.000000 | NA | 0.000000 | 0.979344 | Column has 20996 outliers greater than upper bound (0.20) or lower than lower bound(-0.08). Cap them or remove them. |
| positionX | float64 | 0.000000 | NA | 0.004000 | 0.999000 | Column has 867 outliers greater than upper bound (1.10) or lower than lower bound(0.59). Cap them or remove them. |
| positionY | float64 | 0.000000 | NA | 0.005000 | 0.997000 | Column has 322 outliers greater than upper bound (0.87) or lower than lower bound(0.14). Cap them or remove them. |
Number of All Scatter Plots = 10
All Plots done
Time to run AutoViz = 16 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
=== Completed AutoViz on df_shots ===
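
The report flags rare categories in lastAction, shotType and shotResult and suggests grouping them. One way this could be done is sketched below. The function name `group_rare`, the 1% threshold and the label "Other" are all assumptions chosen for illustration, and the category counts are toy values (only 'OtherBodyPart' comes from the report).

```python
import pandas as pd


def group_rare(series: pd.Series, min_share: float = 0.01, other: str = "Other") -> pd.Series:
    """Replace categories whose relative frequency is below min_share with one label."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    # Keep values that are not rare; substitute the catch-all label elsewhere.
    return series.where(~series.isin(rare), other)


# Toy column: 200 shots, one of which uses a rare body part.
shot_type = pd.Series(
    ["RightFoot"] * 120 + ["LeftFoot"] * 60 + ["Head"] * 19 + ["OtherBodyPart"] * 1
)
grouped = group_rare(shot_type)
print(grouped.value_counts())
```

For shotResult, grouping 'OwnGoal' with other outcomes changes the meaning of the target-like column, so dropping or keeping those rows may be a better choice than merging them.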

=== Running AutoViz on player_shots ===
    max_rows_analyzed is smaller than dataset shape 324543...
        randomly sampled 150000 rows from read CSV file
Shape of your Data Set loaded: (150000, 11)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  4
    Number of Integer-Categorical Columns =  3
    Number of String-Categorical Columns =  4
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  0
    Number of Numeric-Boolean Columns =  0
    Number of Discrete String Columns =  0
    Number of NLP String Columns =  0
    Number of Date Time Columns =  0
    Number of ID Columns =  0
    Number of Columns to Delete =  0
    11 Predictors classified...
        No variables removed since no ID or low-information variables found in data set
Since Number of Rows in data 150000 exceeds maximum, randomly sampling 150000 rows for EDA...
To fix these data quality issues in the dataset, import FixDQ from autoviz...
    All variables classified into correct types.
| Column | Data Type | Missing Values% | Unique Values% | Minimum Value | Maximum Value | DQ Issue |
|---|---|---|---|---|---|---|
| gameID | int64 | 0.000000 | 8 | 81.000000 | 16135.000000 | No issue |
| playerID | int64 | 0.000000 | 3 | 3.000000 | 9566.000000 | Column has 2332 outliers greater than upper bound (8225.50) or lower than lower bound(-3578.50). Cap them or remove them. |
| assisterID | float64 | 26.037333 | NA | 1.000000 | 9526.000000 | 39056 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 2133 outliers greater than upper bound (8127.00) or lower than lower bound(-3553.00). Cap them or remove them. |
| minute | int64 | 0.000000 | 0 | 0.000000 | 103.000000 | No issue |
| situation | object | 0.000000 | 0 | | | No issue |
| lastAction | object | 0.000000 | 0 | | | 24 rare categories: Too many to list. Group them into a single category or drop the categories. |
| shotType | object | 0.000000 | 0 | | | 1 rare categories: ['OtherBodyPart']. Group them into a single category or drop the categories. |
| shotResult | object | 0.000000 | 0 | | | 1 rare categories: ['OwnGoal']. Group them into a single category or drop the categories. |
| xGoal | float64 | 0.000000 | NA | 0.000000 | 0.979344 | Column has 20996 outliers greater than upper bound (0.20) or lower than lower bound(-0.08). Cap them or remove them. |
| positionX | float64 | 0.000000 | NA | 0.004000 | 0.999000 | Column has 867 outliers greater than upper bound (1.10) or lower than lower bound(0.59). Cap them or remove them. |
| positionY | float64 | 0.000000 | NA | 0.005000 | 0.997000 | Column has 322 outliers greater than upper bound (0.87) or lower than lower bound(0.14). Cap them or remove them. |
Number of All Scatter Plots = 10
All Plots done
Time to run AutoViz = 16 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
=== Completed AutoViz on player_shots ===
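
The report suggests imputing the 39056 missing assisterID values with a mean, median, mode or constant, and it also treats the player ID columns as numeric when flagging outliers. Because these columns are identifiers, averaging or capping them has no meaning. A possible alternative is sketched here, assuming that a missing assisterID means the shot had no recorded assist (an assumption, not something the output confirms). The column name `has_assist` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy rows only; the real frame has 324543 shots.
shots = pd.DataFrame(
    {
        "playerID": [3, 7, 7],
        "assisterID": [12.0, np.nan, 5.0],
    }
)

# Record whether an assist exists instead of inventing an assister.
shots["has_assist"] = shots["assisterID"].notna()

# Store the identifier as a nullable integer so missing values stay missing.
shots["assisterID"] = shots["assisterID"].astype("Int64")

print(shots.dtypes)
print(shots)
```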

Skipping teamstats...
=== Running AutoViz on df_combined ===
Shape of your Data Set loaded: (12680, 39)
#######################################################################################
######################## C L A S S I F Y I N G  V A R I A B L E S  ####################
#######################################################################################
Classifying variables in data set...
    Number of Numeric Columns =  6
    Number of Integer-Categorical Columns =  21
    Number of String-Categorical Columns =  3
    Number of Factor-Categorical Columns =  0
    Number of String-Boolean Columns =  0
    Number of Numeric-Boolean Columns =  0
    Number of Discrete String Columns =  0
    Number of NLP String Columns =  3
    Number of Date Time Columns =  3
    Number of ID Columns =  1
    Number of Columns to Delete =  2
    39 Predictors classified...
        3 variable(s) removed since they were ID or low-information variables
        List of variables removed: ['gameID', 'home_location', 'away_location']
6 numeric variables in data exceeds limit, taking top 30 variables
    List of variables selected: ['home_xGoals', 'home_ppda', 'away_xGoals', 'away_ppda', 'home_yellowCards', 'away_yellowCards']
   Total columns > 30, too numerous to print.
To fix these data quality issues in the dataset, import FixDQ from autoviz...
    All variables classified into correct types.
| Column | Data Type | Missing Values% | Unique Values% | Minimum Value | Maximum Value | DQ Issue |
|---|---|---|---|---|---|---|
| gameID | int64 | 0.000000 | 100 | 81.000000 | 16135.000000 | Possible ID column: drop before modeling step. |
| leagueID | int64 | 0.000000 | 0 | 1.000000 | 5.000000 | No issue |
| season | int64 | 0.000000 | 0 | 2014.000000 | 2020.000000 | Possible date-time colum: transform before modeling step. |
| date | object | 0.000000 | 53 | | | No issue |
| homeTeamID | int64 | 0.000000 | 1 | 71.000000 | 262.000000 | Column has 55 outliers greater than upper bound (256.00) or lower than lower bound(8.00). Cap them or remove them. |
| awayTeamID | int64 | 0.000000 | 1 | 71.000000 | 262.000000 | Column has 55 outliers greater than upper bound (256.00) or lower than lower bound(8.00). Cap them or remove them. |
| homeGoals | int64 | 0.000000 | 0 | 0.000000 | 10.000000 | Column has 981 outliers greater than upper bound (3.50) or lower than lower bound(-0.50). Cap them or remove them. |
| awayGoals | int64 | 0.000000 | 0 | 0.000000 | 9.000000 | Column has 44 outliers greater than upper bound (5.00) or lower than lower bound(-3.00). Cap them or remove them. |
| homeGoalsHalfTime | int64 | 0.000000 | 0 | 0.000000 | 6.000000 | Column has 403 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them. |
| awayGoalsHalfTime | int64 | 0.000000 | 0 | 0.000000 | 5.000000 | Column has 225 outliers greater than upper bound (2.50) or lower than lower bound(-1.50). Cap them or remove them. |
| home_season | int64 | 0.000000 | 0 | 2014.000000 | 2020.000000 | Possible date-time colum: transform before modeling step. |
| home_date | object | 0.000000 | 53 | | | No issue |
| home_location | object | 0.000000 | 0 | | | Possible Zero-variance or low information colum: drop before modeling step. |
| home_goals | int64 | 0.000000 | 0 | 0.000000 | 10.000000 | Column has 981 outliers greater than upper bound (3.50) or lower than lower bound(-0.50). Cap them or remove them., Column has a high correlation with ['homeGoals']. Consider dropping one of them. |
| home_xGoals | float64 | 0.000000 | NA | 0.000000 | 6.630490 | Column has 276 outliers greater than upper bound (3.76) or lower than lower bound(-0.92). Cap them or remove them. |
| home_shots | int64 | 0.000000 | 0 | 0.000000 | 47.000000 | Column has 167 outliers greater than upper bound (27.50) or lower than lower bound(-0.50). Cap them or remove them. |
| home_shotsOnTarget | int64 | 0.000000 | 0 | 0.000000 | 18.000000 | Column has 351 outliers greater than upper bound (10.50) or lower than lower bound(-1.50). Cap them or remove them. |
| home_deep | int64 | 0.000000 | 0 | 0.000000 | 42.000000 | Column has 227 outliers greater than upper bound (18.00) or lower than lower bound(-6.00). Cap them or remove them. |
| home_ppda | float64 | 0.000000 | NA | 1.897400 | 97.333300 | Column has 540 outliers greater than upper bound (21.31) or lower than lower bound(-1.88). Cap them or remove them. |
| home_fouls | int64 | 0.000000 | 0 | 0.000000 | 33.000000 | Column has 241 outliers greater than upper bound (22.50) or lower than lower bound(2.50). Cap them or remove them. |
| home_corners | int64 | 0.000000 | 0 | 0.000000 | 20.000000 | Column has 143 outliers greater than upper bound (13.00) or lower than lower bound(-3.00). Cap them or remove them. |
| home_yellowCards | float64 | 0.007886 | NA | 0.000000 | 8.000000 | 1 missing values. Impute them with mean, median, mode, or a constant value such as 123., Column has 37 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them. |
| home_redCards | int64 | 0.000000 | 0 | 0.000000 | 3.000000 | Column has 1078 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them. |
| home_result | object | 0.000000 | 0 | | | No issue |
| away_season | int64 | 0.000000 | 0 | 2014.000000 | 2020.000000 | Possible date-time colum: transform before modeling step. |
| away_date | object | 0.000000 | 53 | | | No issue |
| away_location | object | 0.000000 | 0 | | | Possible Zero-variance or low information colum: drop before modeling step. |
| away_goals | int64 | 0.000000 | 0 | 0.000000 | 9.000000 | Column has 44 outliers greater than upper bound (5.00) or lower than lower bound(-3.00). Cap them or remove them., Column has a high correlation with ['awayGoals']. Consider dropping one of them. |
| away_xGoals | float64 | 0.000000 | NA | 0.000000 | 6.186960 | Column has 312 outliers greater than upper bound (3.10) or lower than lower bound(-0.91). Cap them or remove them. |
| away_shots | int64 | 0.000000 | 0 | 0.000000 | 39.000000 | Column has 161 outliers greater than upper bound (23.00) or lower than lower bound(-1.00). Cap them or remove them. |
| away_shotsOnTarget | int64 | 0.000000 | 0 | 0.000000 | 15.000000 | Column has 233 outliers greater than upper bound (9.50) or lower than lower bound(-2.50). Cap them or remove them. |
| away_deep | int64 | 0.000000 | 0 | 0.000000 | 28.000000 | Column has 423 outliers greater than upper bound (13.00) or lower than lower bound(-3.00). Cap them or remove them. |
| away_ppda | float64 | 0.000000 | NA | 2.122000 | 152.000000 | Column has 606 outliers greater than upper bound (24.25) or lower than lower bound(-2.58). Cap them or remove them. |
| away_fouls | int64 | 0.000000 | 0 | 0.000000 | 32.000000 | Column has 81 outliers greater than upper bound (25.00) or lower than lower bound(1.00). Cap them or remove them. |
| away_corners | int64 | 0.000000 | 0 | 0.000000 | 19.000000 | Column has 283 outliers greater than upper bound (10.50) or lower than lower bound(-1.50). Cap them or remove them. |
| away_yellowCards | float64 | 0.000000 | NA | 0.000000 | 9.000000 | Column has 43 outliers greater than upper bound (6.00) or lower than lower bound(-2.00). Cap them or remove them. |
| away_redCards | int64 | 0.000000 | 0 | 0.000000 | 3.000000 | Column has 1396 outliers greater than upper bound (0.00) or lower than lower bound(0.00). Cap them or remove them. |
| away_result | object | 0.000000 | 0 | | | No issue |
| gameresult | object | 0.000000 | 0 | | | No issue |
Number of All Scatter Plots = 21
Could not draw wordcloud plot for date
Could not draw wordcloud plot for home_date
Could not draw wordcloud plot for away_date
All Plots done
Time to run AutoViz = 54 seconds 

 ###################### AUTO VISUALIZATION Completed ########################
=== Completed AutoViz on df_combined ===
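
For df_combined the report points at three kinds of cleanup: date columns stored as `object` (which AutoViz then treats as NLP text and fails to draw as word clouds), zero-variance columns such as home_location, and columns that duplicate each other such as home_goals and homeGoals. A minimal sketch of that cleanup is given below. The function name `tidy` is hypothetical, the two rows reuse gameID, date and homeGoals values from the preview at the top of the notebook, and the home_goals and home_location values are assumed for illustration.

```python
import pandas as pd


def tidy(df: pd.DataFrame, date_cols: tuple[str, ...] = ("date",)) -> pd.DataFrame:
    """Parse date columns, then drop constant and exactly duplicated columns."""
    out = df.copy()
    for col in date_cols:
        out[col] = pd.to_datetime(out[col])

    # Columns with a single distinct value carry no information.
    constant = [c for c in out.columns if out[c].nunique(dropna=False) <= 1]
    out = out.drop(columns=constant)

    # Keep the first of any group of columns with identical values and dtype.
    kept, duplicate = [], []
    for col in out.columns:
        if any(out[col].equals(out[k]) for k in kept):
            duplicate.append(col)
        else:
            kept.append(col)
    return out.drop(columns=duplicate)


toy = pd.DataFrame(
    {
        "gameID": [81, 82],
        "date": ["2015-08-08 15:45:00", "2015-08-08 18:00:00"],
        "homeGoals": [1, 0],
        "home_goals": [1, 0],         # assumed to mirror homeGoals
        "home_location": ["h", "h"],  # assumed constant value
    }
)
tidied = tidy(toy)
print(tidied.dtypes)
```

On the real frame, home_date and away_date would be passed in `date_cols` as well; once parsed, they would likely be detected as duplicates of date if they hold the same timestamps.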
